Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game

Authors

  • Matt Oberdorfer
  • Matt Abuzalaf
Abstract

We present the first reinforcement-learning model that self-improves its reward-modulated training through a continuously improving “intuition” neural network. An agent was trained to play the arcade video game Pong under two reward-based alternatives: one in which the paddle was placed randomly during training, and a second in which the agent was simultaneously trained with three additional neural networks so that it could develop a sense of “certainty” about how likely its own predicted paddle position was to return the ball. If the agent was less than 95% certain of returning the ball, the policy used an intuition neural network to place the paddle. We trained both architectures for an equal number of epochs and tested learning performance by letting the trained programs play against a near-perfect opponent. We found that the reinforcement learning model that uses an intuition neural network to place the paddle during reward training quickly overtakes the simple architecture in its ability to outplay the near-perfect opponent, and that it outscores that opponent by an increasingly wide margin after additional epochs of training.
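The certainty gate described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `choose_paddle_position` and the three callables standing in for the position, certainty, and intuition networks are hypothetical, and only the 95% threshold comes from the abstract.

```python
from typing import Callable, Sequence

# From the abstract: below 95% certainty, the policy defers to the intuition network.
CERTAINTY_THRESHOLD = 0.95


def choose_paddle_position(
    state: Sequence[float],
    predict_position: Callable[[Sequence[float]], float],
    predict_certainty: Callable[[Sequence[float], float], float],
    intuition_position: Callable[[Sequence[float]], float],
    threshold: float = CERTAINTY_THRESHOLD,
) -> float:
    """Pick a paddle position, falling back to intuition when certainty is low.

    All three callables are hypothetical stand-ins for trained networks;
    their inputs and outputs are assumptions made for this sketch.
    """
    # The agent's own proposed paddle position for the current game state.
    proposed = predict_position(state)
    # Estimated probability that the proposed position returns the ball.
    certainty = predict_certainty(state, proposed)
    if certainty < threshold:
        # Less than 95% certain: let the intuition network place the paddle.
        return intuition_position(state)
    return proposed


if __name__ == "__main__":
    # Toy stand-ins: constant predictions, for demonstration only.
    position = choose_paddle_position(
        state=[0.5, 0.5],
        predict_position=lambda s: 0.2,
        predict_certainty=lambda s, p: 0.6,
        intuition_position=lambda s: 0.8,
    )
    print(position)
```

With the toy stand-ins above, the certainty of 0.6 falls below the threshold, so the intuition placement is used. How the certainty estimate is actually produced by the three additional networks is not specified in the abstract and is left abstract here.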

Similar resources

Presenting an MLP Neural Network for Frost Prediction in Kermanshah Province

This study, with the help of minimum temperature data, has addressed the prediction of frost during 21 years period by means of neural network in Kermanshah province. In order to forecast frost, data were converted to the values between 0 and 1 by means of a subjective and one to one (injective) function. We have used feed-forward neural network by one hidden interior layer with number of chang...

Traffic Signal Prediction Using Elman Neural Network and Particle Swarm Optimization

Prediction of traffic is very crucial for its management. Because of human involvement in the generation of this phenomenon, traffic signal is normally accompanied by noise and high levels of non-stationarity. Therefore, traffic signal prediction as one of the important subjects of study has attracted researchers’ interests. In this study, a combinatorial approach is proposed for traffic signal...

RTDGPS Implementation by Online Prediction of GPS Position Components Error Using GA-ANN Model

If both Reference Station (RS) and navigational device in Differential Global Positioning System (DGPS) receive signals from the same satellite, RS Position Components Error (RPCE) can be used to compensate for navigational device error. This research used hybrid method for RPCE prediction which was collected by a low-cost GPS receiver. It is a combination of Genetic Algorithm (GA) computing an...

Signal Prediction by Layered Feed-Forward Neural Network (Research Note)

In this paper a nonparametric neural network (NN) technique for prediction of future values of a signal based on its past history is presented. This approach bypasses modeling, identification, and parameter estimation phases that are required by conventional parametric techniques. A multi-layer feed forward NN is employed. It develops an internal model of the signal through a training operation...

Global Solar Radiation Prediction for Makurdi, Nigeria Using Feed Forward Backward Propagation Neural Network

The optimum design of solar energy systems strongly depends on the accuracy of solar radiation data. However, the availability of accurate solar radiation data is undermined by the high cost of measuring equipment or non-functional ones. This study developed a feed-forward backpropagation artificial neural network model for prediction of global solar radiation in Makurdi, Nigeria (7.7322° N lo...

Journal title:
  • CoRR

Volume: abs/1609.07434

Publication date: 2016